Extend dynamic filter pushdown to Left and LeftSemi hash joins#20447
Extend dynamic filter pushdown to Left and LeftSemi hash joins#20447helgikrs wants to merge 2 commits intoapache:mainfrom
Conversation
|
|
||
| fn allow_join_dynamic_filter_pushdown(&self, config: &ConfigOptions) -> bool { | ||
| if self.join_type != JoinType::Inner | ||
| if !matches!(self.join_type, JoinType::Inner | JoinType::Left | JoinType::LeftSemi) |
There was a problem hiding this comment.
Can JoinType::Right and JoinType::RightSemi also be applied based on the same rationale?
There was a problem hiding this comment.
I don't think so. The build side is always the left side and the probe side is always the right side. The filters are therefore always pushed to the right side. We need to retain all rows from the right side for a Right join, so there's nothing we can push down.
A right join query can still take advantage of this if the the optimizer decides to swap the sides to build the right side to the join, in that case it would become a left join here.
There was a problem hiding this comment.
Actually I think it might be safe to add RightSemi and LeftAnti to this list.
The dynamic filter only removes probe rows that can't match any build row, and neither RightSemi (which outputs only matched probe rows) nor LeftAnti (which outputs only unmatched build rows) includes unmatched probe rows in its output. So I think it would be safe to add those here as well. Right and RightAnti would not be.
7cf1bc6 to
4e40891
Compare
4e40891 to
0dac298
Compare
The dynamic filter from HashJoinExec was previously gated to Inner joins only. Left and LeftSemi joins have the same probe-side filtering semantics. PR apache#20192 refactored the join filter pushdown infrastructure, which makes extending self-generated filters to Left/LeftSemi join types trivial.
0dac298 to
1ea0252
Compare
|
Thanks @helgikrs, all tests have passed. I'll try to review it carefully tomorrow. |
nuno-faria
left a comment
There was a problem hiding this comment.
LGTM, I couldn't find an edge case where it fails. I think other join types can also be added, but this can be done in a future PR.
| if !matches!( | ||
| self.join_type, | ||
| JoinType::Inner | JoinType::Left | JoinType::LeftSemi | ||
| ) || !config.optimizer.enable_join_dynamic_filter_pushdown |
There was a problem hiding this comment.
I think it would also work with LeftAnti, but it doesn't need to be included in this PR.
Also, it would be interesting to explore RightSemi as well, since some queries will opt to use it if the statistics are not accurate. For example:
-- t1 >> t2
-- uses LeftSemi: dynamic filter added
select *
from t1
where t1.k in (select k from t2)
and t1.v2 in (123);
-- uses RightSemi: no dynamic filter
select *
from t1
where t1.k in (select k from t2)
and t1.v2 in (123, 456);Once again, both cases can be considered in a future PR.
|
@adriangb please take a look as well if you have the chance. |
adriangb
left a comment
There was a problem hiding this comment.
Looks good to me!
I wonder - should we use on_lr_is_preserved to determine what is safe or not?
The dynamic filter from HashJoinExec was previously gated to Inner joins only. Left and LeftSemi joins have the same probe-side filtering semantics.
PR #20192 refactored the join filter pushdown infrastructure, which as a side effect makes extending self-generated filters to Left/LeftSemi join types trivial.
Which issue does this PR close?
This PR makes progress on #16973
Rationale for this change
The self-generated dynamic filter in HashJoinExec filters the probe side using build-side values. For Left and LeftSemi joins, the right-hand probe side has the same filtering semantics as Inner. Relaxing the gate to take advantage of this optimization for Left and LeftSemi joins.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.